fix: browse() and list_datablocks() for V3 multi-frame EXPLORE (S7-1200 FW V4.5) by tommasofaedo · Pull Request #753 · gijzelaerr/python-snap7

tommasofaedo · 2026-06-19T06:06:49Z

## Problem

On S7-1200 firmware V4.5 (V3 protocol), `list_datablocks()` and `browse()`
return empty results because the EXPLORE response for `0x8A11FFFF` spans
**multiple TPKT frames** and uses a different encoding than V1/V2 PLCs.

Three separate issues:

1. **Multi-frame response not collected** — the EXPLORE response is split into
   3 consecutive TPKT frames. The second and third frames are raw BLOB
   continuation data with no response header (only a V3 HMAC prefix). The
   existing code only reads the first frame.

2. **Wrong parser for V3 EXPLORE** — V3 PLCs return a zlib-compressed
   `PlcContentInfo` XML blob (magic `78 DA`), not the PObject tree that
   `_parse_explore_datablocks()` expects.

3. **`_parse_explore_fields` crashes on V3 attributes** — three bugs when
   parsing EXPLORE responses from V3 PLCs:
   - WSTRING dtype `0x15` not recognised (only `0x13` was checked)
   - Strings decoded as UTF-16-BE instead of UTF-8
   - BLOB skip logic misses an extra `0x00` byte that V3 PLCs insert before
     the VLQ length; WSTRING skip was also missing the data bytes

## Changes

### `s7/connection.py` — add `_collect_explore_frames()`

New method on `S7CommPlusConnection` that collects all continuation frames
after the first EXPLORE response frame.  Detection: a frame whose body
(after HMAC strip) is smaller than the reference size by more than 5 bytes
is the last fragment.
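
A minimal sketch of the two pieces described above, offered as an illustration rather than the PR's actual code. The `[hash_len][hash_bytes]` prefix layout and the `len(body) > 33` guard come from the diff quoted later in this thread; the function names, and the idea of passing the reference size in as `reference_len`, are assumptions made for the example.

```python
from typing import List

LAST_FRAGMENT_SLACK = 5  # bytes; the threshold quoted in the PR description


def strip_hmac_prefix(body: bytes) -> bytes:
    """Drop a [hash_len][hash_bytes] prefix from a V3 frame body.

    Bodies of 33 bytes or fewer are returned unchanged, mirroring the
    guard in the quoted diff.
    """
    if len(body) <= 33:
        return body
    hash_len = body[0]
    return body[1 + hash_len:]


def is_last_fragment(body_len: int, reference_len: int) -> bool:
    """Heuristic: a body more than 5 bytes short of a full frame ends the stream."""
    return reference_len - body_len > LAST_FRAGMENT_SLACK


def join_fragments(first_payload: bytes, bodies: List[bytes], reference_len: int) -> bytes:
    """Concatenate the first payload with stripped continuation bodies."""
    parts = [first_payload]
    for body in bodies:
        data = strip_hmac_prefix(body)
        parts.append(data)
        if is_last_fragment(len(data), reference_len):
            break  # anything after the short frame is ignored
    return b"".join(parts)


# Synthetic frames: a 32-byte hash preceded by its length byte.
hmac = bytes([32]) + b"\xaa" * 32
full = hmac + b"B" * 100
short = hmac + b"C" * 10
joined = join_fragments(b"AAA", [full, short, full], reference_len=100)
```

The heuristic stops at the first short body, so a legitimately short intermediate frame would truncate the result.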

### `s7/_s7commplus_client.py`

- **`_parse_explore_datablocks_xml()`** — new parser that finds the `78 DA`
  zlib magic in the concatenated response, decompresses it, and extracts
  `Entity[@Id="Block" Header[@Type="DB"]]` nodes from the `PlcContentInfo`
  XML.  Falls back to `_parse_explore_datablocks()` if no zlib magic is
  found (backward compatible with V1/V2 PLCs).

- **`list_datablocks()`** — when `_session_key is not None` (V3 PLCs),
  builds the `0x8A11FFFF` EXPLORE payload, calls `_collect_explore_frames()`
  to gather all frames, then calls `_parse_explore_datablocks_xml()`.

- **`browse()`** — calls `_collect_explore_frames()` for each per-DB EXPLORE
  on V3 connections.

- **`_parse_explore_fields()`** — fixes for V3 PLCs:
  - Accept dtype `0x15` (WSTRING) in addition to `0x13` for name attributes
  - Decode name strings as UTF-8 (not UTF-16-BE)
  - BLOB skip: add 1 byte for the extra `0x00` before VLQ length
  - WSTRING skip: include `str_len` bytes after the VLQ
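
The XML path can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's `_parse_explore_datablocks_xml()`: the element layout (`Entity` with a child `Header`) is one reading of the XPath-like expression above, and the `Name` and `Number` attribute names are hypothetical. A synthetic blob of this shape could also serve as a unit-test fixture.

```python
import zlib
import xml.etree.ElementTree as ET
from typing import Dict, List

ZLIB_MAGIC = b"\x78\xda"  # zlib header emitted for the highest compression levels


def parse_datablocks_xml(response: bytes) -> List[Dict[str, object]]:
    """Find a zlib stream in the response and list the DB blocks in its XML."""
    pos = response.find(ZLIB_MAGIC)
    if pos < 0:
        return []  # the PR falls back to the PObject-tree parser here
    try:
        # A decompress object stops at the end of the stream and leaves
        # any trailing bytes in .unused_data instead of failing.
        xml_bytes = zlib.decompressobj().decompress(response[pos:])
    except zlib.error:
        return []
    root = ET.fromstring(xml_bytes)
    blocks: List[Dict[str, object]] = []
    for entity in root.iter("Entity"):
        header = entity.find("Header")
        if entity.get("Id") != "Block" or header is None:
            continue
        if header.get("Type") != "DB":
            continue
        # "Name" and "Number" are assumed attribute names.
        blocks.append({"name": header.get("Name"), "number": int(header.get("Number"))})
    return blocks


# Synthetic response: header bytes, compressed XML, trailing bytes.
sample_xml = (
    b'<PlcContentInfo>'
    b'<Entity Id="Block"><Header Type="DB" Name="Data_block_1" Number="100"/></Entity>'
    b'<Entity Id="Block"><Header Type="FC" Name="Main" Number="1"/></Entity>'
    b'</PlcContentInfo>'
)
sample_response = b"\x00\x01hdr" + zlib.compress(sample_xml, 9) + b"\x00\x00"
result = parse_datablocks_xml(sample_response)
```

Searching for the two magic bytes can match by coincidence inside preceding header data, so a production parser would probably want to anchor the search at a known offset.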

## Tested on

- **PLC:** Siemens S7-1200 CPU 1212C DC/DC/DC
- **Firmware:** V4.5
- **Protocol:** V3 (no TLS, no password)

`list_datablocks()` now correctly returns `[{"name": "Data_block_1",
"number": 100, "rid": 2316173412}]` where it previously returned `[]`.

## Known limitation (documented, not fixed)

On FW V4.5, DB field definitions and I/Q/M tag names are stored in zlib
BLOBs with a Siemens preset dictionary (magic `78 7D`, FDICT flag set, dict
checksum `58 14 B0 3B`).  Python's `zlib.decompress()` returns
`Z_NEED_DICT` — the preset dictionary is embedded in TIA Portal and has not
been published by Siemens.

As a result, `browse()` returns DB names and numbers but cannot enumerate
individual field names on V3 PLCs.  This is a protocol-level constraint,
not a code bug.
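
The preset-dictionary behaviour can be reproduced with the standard library alone. The sketch below uses a made-up dictionary (`demo_dict`), since the real one is not public, and the helper names are hypothetical. It shows how the FDICT bit and the 4-byte dictionary ID (the Adler-32 of the dictionary, per RFC 1950) can be read from the header, and that decompression only succeeds when the same dictionary is supplied.

```python
import zlib


def needs_preset_dict(stream: bytes) -> bool:
    """True when the zlib header has the FDICT bit (0x20 of the FLG byte) set."""
    return len(stream) >= 2 and bool(stream[1] & 0x20)


def preset_dict_id(stream: bytes) -> int:
    """Adler-32 of the expected dictionary, stored big-endian after the header."""
    return int.from_bytes(stream[2:6], "big")


# Made-up dictionary and payload, purely for demonstration.
demo_dict = b"<Member Name= Datatype= Offset="
payload = b'<Member Name="Speed" Datatype="Real"/>'

comp = zlib.compressobj(level=1, zdict=demo_dict)
stream = comp.compress(payload) + comp.flush()

# Without the dictionary, plain decompression fails.
try:
    zlib.decompress(stream)
    plain_ok = True
except zlib.error:
    plain_ok = False

# With the matching dictionary, the payload is recovered.
restored = zlib.decompressobj(zdict=demo_dict).decompress(stream)
```

A check like `needs_preset_dict()` would let `browse()` report "dictionary required" explicitly rather than surfacing a generic zlib error.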

…00 FW V4.5)

On V3 PLCs (FW >= V4.5) the EXPLORE response for RID 0x8A11FFFF spans multiple TPKT frames and uses a zlib-compressed PlcContentInfo XML format instead of the PObject tree expected by _parse_explore_datablocks(). The existing reassemble=True path does not strip V3 HMAC prefixes from continuation frames, so list_datablocks() returned [] on these PLCs.

Changes:

connection.py:
- Add collect_explore_frames(): collects V3 multi-fragment EXPLORE responses by receiving continuation frames and stripping their HMAC prefix, stopping when a shorter-than-reference frame is detected.

_s7commplus_client.py:
- Add _build_explore_payload_v3(): VLQ-encoded EXPLORE payload for V3 PLCs (required format for 0x8A11FFFF and per-DB RID explores).
- Add _parse_explore_datablocks_xml(): decompresses the zlib PlcContentInfo XML blob and extracts Entity[@id="Block"][@type="DB"] entries; falls back to _parse_explore_datablocks() when no zlib magic is found.
- list_datablocks(): when protocol_version >= V3, use _build_explore_payload_v3 + collect_explore_frames + _parse_explore_datablocks_xml.
- browse(): when protocol_version >= V3, use V3 payload builder and frame collector for each per-DB EXPLORE.
- _parse_explore_fields(): three fixes for V3 PLCs:
  * Accept WSTRING dtype 0x15 in addition to 0x13 for name attributes.
  * Auto-detect encoding: UTF-8 (V3, no null bytes) vs UTF-16-BE (V1/V2).
  * BLOB skip: account for the extra 0x00 byte V3 PLCs insert before VLQ len.
  * WSTRING skip: advance past string data bytes (was only skipping VLQ).

Tested on S7-1200 CPU 1212C DC/DC/DC, firmware V4.5 (V3 protocol, no TLS):
- list_datablocks() now returns [{"name": "Data_block_1", "number": 100, "rid": 2316173412}] where it previously returned [].
- The PlcContentInfo XML (6131 bytes after decompression) is correctly parsed from a 3-frame response (first 946-byte frame + two continuations).

Known limitation: on FW V4.5, DB field definitions and I/Q/M tag names are stored in zlib BLOBs with a Siemens preset dictionary (magic 78 7D, FDICT flag set). Python zlib.decompress() returns Z_NEED_DICT. browse() returns DB names/numbers but cannot enumerate individual field names on V3 PLCs.
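
The encoding auto-detection mentioned in the commit message (UTF-8 when the name contains no null bytes, UTF-16-BE otherwise) might look roughly like this. `decode_name` is a hypothetical helper written for illustration, not the function in the PR.

```python
def decode_name(raw: bytes) -> str:
    """Decode an attribute name using the null-byte heuristic.

    ASCII text encoded as UTF-16-BE always contains 0x00 high bytes,
    whereas UTF-8 text does not contain null bytes in practice.
    """
    if b"\x00" in raw:
        return raw.decode("utf-16-be")
    return raw.decode("utf-8")


v1_name = decode_name("Data_block_1".encode("utf-16-be"))  # V1/V2 style
v3_name = decode_name(b"Data_block_1")                      # V3 style
umlaut = decode_name("Größe".encode("utf-8"))               # non-ASCII UTF-8
```

The heuristic has a blind spot: UTF-16-BE text made up entirely of characters outside Latin-1 (CJK names, for example) contains no null bytes and would be routed to the UTF-8 branch.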

gijzelaerr

Review Summary

This PR adds V3 (S7-1200 FW V4.5) support for list_datablocks() and browse() — three distinct fixes for multi-frame collection, zlib-compressed XML parsing, and _parse_explore_fields V3 attribute encoding. Real-hardware tested. No malicious code.

Issues to address:

1. No unit tests. This is the biggest gap. The XML parser, the multi-frame collector, and the _parse_explore_fields fixes all have zero test coverage. At minimum: a test for _parse_explore_datablocks_xml with a synthetic zlib-compressed XML blob, and a test for the WSTRING/BLOB skip fixes.

2. XML entity expansion (XXE). ET.fromstring() uses the default parser. Since the XML comes from a PLC (not user input), the risk is low, but for defense-in-depth consider defusedxml or at least rejecting DTD/entity declarations before parsing, since a malicious response could trigger entity expansion.

3. collect_explore_frames fragment detection is fragile. The "body shorter than reference by >5 bytes = last fragment" heuristic assumes all full-size frames are within 5 bytes of each other. If the PLC sends a legitimately short intermediate frame, it would be misdetected as the last. A more robust approach: use the V3 protocol's own termination signal (if available) or at least add a max-frame-count guard to prevent infinite loops.

4. collect_explore_frames has no size/count limits. A malformed V3 response could drive unbounded memory allocation. Add caps similar to _recv_reassembled_payload (16 MiB / 4096 fragments).

5. _build_explore_payload_v3 uses VLQ for ExploreId. The existing _build_explore_request was just fixed (in #749) to use fixed UInt32 for ExploreId because that's what real PLCs expect. Using VLQ here may work for V3 but diverges from the corrected convention — is this intentional?

6. BLOB skip offset += 1 for the extra 0x00 byte is V3-specific but runs unconditionally in _parse_explore_fields. If a V1/V2 response has a BLOB attribute, this would skip one byte too many. Guard it behind a V3 check.

7. Async parity. S7CommPlusAsyncClient.list_datablocks() and browse() are not updated — async callers on V3 PLCs still get empty results.

Positive notes:

- Uses xml.etree.ElementTree (safe, stdlib)
- V1/V2 fallback path preserved correctly
- The zlib magic detection (78 DA) is sound
- No legacy snap7/ files touched

Not ready to merge — needs tests, size caps, and the BLOB skip V3 guard.

gijzelaerr · 2026-06-19T15:04:48Z

+        xml_bytes = zlib.decompress(response[zlib_pos:])
+    except zlib.error as exc:
+        logger.debug(f"_parse_explore_datablocks_xml: zlib error {exc}")
+        return []


ET.fromstring() uses the default XML parser. While the XML comes from a PLC, for defense-in-depth consider at minimum rejecting entity declarations. Python 3.8+ ET.fromstring is safe against XXE by default (external entities are not resolved), so this is low-risk, but worth a comment noting the assumption.
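
One low-cost form of the defense-in-depth suggested here is to refuse any document that declares a DTD before handing it to ElementTree. The sketch below is an assumption about how that could be done with the standard library only; `parse_untrusted_xml` is a hypothetical name, and defusedxml remains the packaged alternative.

```python
import xml.etree.ElementTree as ET


def parse_untrusted_xml(xml_bytes: bytes) -> ET.Element:
    """Parse XML, refusing documents that declare a DTD or entities.

    The byte search assumes an ASCII-compatible encoding such as UTF-8;
    a UTF-16 document would need to be decoded before checking.
    """
    if b"<!DOCTYPE" in xml_bytes or b"<!ENTITY" in xml_bytes:
        raise ValueError("DTD or entity declarations are not accepted")
    return ET.fromstring(xml_bytes)


ok = parse_untrusted_xml(b'<PlcContentInfo><Entity Id="Block"/></PlcContentInfo>')

try:
    parse_untrusted_xml(b'<!DOCTYPE r [<!ENTITY a "x">]><r>&a;</r>')
    rejected = False
except ValueError:
    rejected = True
```

Since PLC-generated XML is unlikely to carry a DTD, rejecting one outright should not affect legitimate responses, though that is an assumption worth confirming against captures.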

gijzelaerr · 2026-06-19T15:04:48Z

+                # V3 non-TLS: strip the HMAC prefix ([hash_len][hash_bytes])
+                if self._protocol_version >= ProtocolVersion.V3 and len(body) > 33:
+                    hash_len = body[0]
+                    body = body[1 + hash_len :]


No size or fragment-count cap. A malformed V3 response could loop indefinitely and allocate unbounded memory. Add limits similar to _recv_reassembled_payload (_MAX_REASSEMBLED_FRAGMENTS / _MAX_REASSEMBLED_BYTES).

Also, the "body shorter than reference by >5 bytes" heuristic is fragile — if the PLC sends a legitimately shorter intermediate frame, collection stops early and silently truncates the response.

gijzelaerr · 2026-06-19T15:04:48Z

                    break
                count, consumed = _vlq32(response, offset)
                offset += consumed
                offset += count


The offset += 1 for the extra 0x00 byte before BLOB VLQ length is V3-specific, but this code runs for all protocol versions. If a V1/V2 EXPLORE response contains a BLOB attribute, this will skip one byte too many and corrupt all subsequent parsing. Guard with a V3 check, or pass the protocol version into this function.
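
To make the failure mode concrete, here is a small sketch of a version-guarded BLOB skip. It assumes the VLQ is encoded as big-endian 7-bit groups with `0x80` as the continuation bit, and `read_vlq32` / `skip_blob` are illustrative stand-ins for the project's `_vlq32` and the skip logic, not copies of them.

```python
from typing import Tuple


def read_vlq32(buf: bytes, offset: int) -> Tuple[int, int]:
    """Return (value, bytes_consumed) for an unsigned VLQ at offset.

    Assumed encoding: most significant group first, 7 bits per byte,
    high bit set on every byte except the last, at most 5 bytes.
    """
    value = 0
    consumed = 0
    while True:
        byte = buf[offset + consumed]
        consumed += 1
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80 or consumed == 5:
            return value, consumed


def skip_blob(buf: bytes, offset: int, is_v3: bool) -> int:
    """Return the offset just past a BLOB attribute value."""
    if is_v3:
        offset += 1  # extra 0x00 byte that V3 PLCs place before the length
    length, consumed = read_vlq32(buf, offset)
    return offset + consumed + length


v1_blob = bytes([0x03]) + b"abc" + b"Z"   # length 3, data, next attribute
v3_blob = b"\x00" + v1_blob               # same BLOB with the V3 padding byte

good_v1 = skip_blob(v1_blob, 0, is_v3=False)
good_v3 = skip_blob(v3_blob, 0, is_v3=True)
bad_v1 = skip_blob(v1_blob, 0, is_v3=True)  # unguarded skip applied to V1 data
```

In the last call the first data byte (`0x61`) is read as the length, so the parser lands far past the real end of the BLOB, which is the corruption described in this comment.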

…n to collect_explore_frames

- Add fragment count and byte size caps using _MAX_REASSEMBLED_FRAGMENTS / _MAX_REASSEMBLED_BYTES (same limits already used by _recv_reassembled_payload)
- Add frag_len == 0 check as primary end-of-stream trailer detection
- Keep shorter-than-full-frame heuristic as fallback
- Update docstring to document termination and safety limits

… comment, pass protocol_version

- _parse_explore_fields: add protocol_version param (default 0 = backward compat). Guard the BLOB extra-0x00 skip with `if protocol_version >= ProtocolVersion.V3` so V1/V2 EXPLORE responses with BLOB attributes are not mis-parsed
- browse(): pass self._connection._protocol_version to _parse_explore_fields
- _parse_explore_datablocks_xml: add comment noting ET.fromstring XXE safety assumption (safe by default in Python 3.8+, XML source is the PLC)

tommasofaedo · 2026-06-29T14:31:06Z

Thank you for the detailed review! I've pushed two commits addressing all three points.

1. collect_explore_frames() — unbounded loop / fragile heuristic

Addressed in commit 906c877:

- Added fragment_count counter + cap check using the existing _MAX_REASSEMBLED_FRAGMENTS / _MAX_REASSEMBLED_BYTES constants (same limits as _recv_reassembled_payload). A malformed response now raises S7ConnectionError instead of looping indefinitely.
- Added if frag_len == 0: break as the primary termination criterion; this matches the standard S7CommPlus end-of-stream trailer (0x72 ver 0x00 0x00) already used in _recv_reassembled_payload.
- Kept the "shorter-than-full-frame" heuristic as a fallback in case the trailer is absent (as observed on some V3 captures). Updated docstring to document both paths.
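
The termination order described in this reply (caps first, zero-length trailer as the primary stop, short-frame heuristic as the fallback) can be sketched as below. The cap values are the ones quoted in the review (4096 fragments, 16 MiB); the function, its parameters and `FragmentLimitError` are hypothetical names for illustration, and the real code raises `S7ConnectionError`.

```python
from typing import Callable, List

MAX_FRAGMENTS = 4096
MAX_BYTES = 16 * 1024 * 1024


class FragmentLimitError(Exception):
    """Stand-in for the connection error raised when a cap is exceeded."""


def collect_fragments(
    recv_fragment: Callable[[], bytes],
    full_len: int,
    slack: int = 5,
    max_fragments: int = MAX_FRAGMENTS,
    max_bytes: int = MAX_BYTES,
) -> bytes:
    """Collect fragments until a trailer, a short frame, or a cap is hit."""
    parts: List[bytes] = []
    total = 0
    while True:
        frag = recv_fragment()
        if len(frag) == 0:
            break  # primary: zero-length end-of-stream trailer
        total += len(frag)
        if len(parts) + 1 > max_fragments or total > max_bytes:
            raise FragmentLimitError("fragment or byte cap exceeded")
        parts.append(frag)
        if full_len - len(frag) > slack:
            break  # fallback: noticeably short frame, trailer assumed absent
    return b"".join(parts)


with_trailer = collect_fragments(iter([b"a" * 100, b"b" * 100, b""]).__next__, 100)
no_trailer = collect_fragments(iter([b"a" * 100, b"b" * 10, b"c" * 100]).__next__, 100)

try:
    collect_fragments(lambda: b"x" * 100, 100, max_fragments=3)
    capped = False
except FragmentLimitError:
    capped = True
```

Checking the caps before appending keeps the buffered data within the limit even when the peer never sends a trailer or a short frame.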

2. _parse_explore_fields() — V3-specific offset += 1 runs for all protocol versions

Addressed in commit d1fd133:

- Added protocol_version: int = 0 parameter (default 0 = V1/unknown, backward compatible).
- Guarded offset += 1 with if protocol_version >= ProtocolVersion.V3:.
- Updated browse() to pass self._connection._protocol_version.

3. ET.fromstring() XXE safety note

Addressed in commit d1fd133: added inline comment noting that ET.fromstring is safe against XXE by default in Python 3.8+ and that the XML source is the PLC (trusted local network device).

Let me know if anything else needs adjustment.

gijzelaerr · 2026-07-01T08:33:01Z

Thanks for the thorough follow-up — all three points look well-addressed:

- Frame caps + frag_len == 0 primary termination: much better. Using the existing _MAX_REASSEMBLED_* constants and keeping the shorter-frame heuristic as a fallback (rather than primary) is the right call.
- V3-guarded offset += 1: passing protocol_version into _parse_explore_fields and guarding the extra byte cleanly fixes the V1/V2 regression risk.
- XXE comment: good to have the explicit note.

Two remaining items from the original review that are still open:

- No unit tests: the PR adds ~170 lines of logic (XML parsing, multi-frame collection, V3 BLOB handling) with zero test coverage. Even a basic test for _parse_explore_datablocks_xml with a synthetic zlib-compressed XML blob would catch regressions.
- No async parity: async browse() / list_datablocks() don't get the V3 path. Not necessarily a blocker for V1 of this PR, but worth noting.

tommasofaedo added 2 commits June 29, 2026 16:23

tommasofaedo wants to merge 3 commits into gijzelaerr:master from tommasofaedo:fix/browse-v3-multiframe-explore